Overview

Dataset statistics

Number of variables: 26
Number of observations: 100000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 19.8 MiB
Average record size in memory: 208.0 B
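
The memory figures in this report are consistent with simple arithmetic. The check below is a minimal sketch that assumes every value occupies 8 bytes; the report does not state the storage width, so `BYTES_PER_VALUE` is an assumption.

```python
# Row and column counts are copied from the report.
N_ROWS = 100_000
N_COLS = 26
BYTES_PER_VALUE = 8  # assumption: 8 bytes per stored value

record_bytes = N_COLS * BYTES_PER_VALUE        # size of one record in bytes
total_mib = N_ROWS * record_bytes / 1024**2    # size of the whole table in MiB
column_kib = N_ROWS * BYTES_PER_VALUE / 1024   # size of a single column in KiB

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
print(f"column size: {column_kib} KiB")
```

Under that assumption the record size is 208 B and the total rounds to 19.8 MiB, matching the figures reported here; a single column comes to 781.25 KiB, in line with the per-variable "Memory size" entries.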

Variable types

CAT: 16
NUM: 7
BOOL: 2
DATE: 1

Warnings

crash_time has a high cardinality: 1440 distinct values [High cardinality]
location has a high cardinality: 44605 distinct values [High cardinality]
on_street_name has a high cardinality: 4328 distinct values [High cardinality]
combine_location has a high cardinality: 44606 distinct values [High cardinality]
nearest_street has a high cardinality: 27727 distinct values [High cardinality]
number_of_motorist_injured is highly correlated with number_of_persons_injured [High correlation]
number_of_persons_injured is highly correlated with number_of_motorist_injured [High correlation]
crash_year is highly correlated with collision_id [High correlation]
collision_id is highly correlated with crash_year [High correlation]
collision_id has unique values [Unique]
number_of_persons_injured has 72699 (72.7%) zeros [Zeros]
number_of_pedestrians_injured has 95454 (95.5%) zeros [Zeros]
number_of_motorist_injured has 81887 (81.9%) zeros [Zeros]
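
Warnings of this kind can be derived from per-column value counts. The following is a rough sketch of that idea, not the profiler's actual implementation; the function name `column_flags` and both thresholds are hypothetical and chosen only for illustration.

```python
from collections import Counter


def column_flags(values, cardinality_threshold=50, zeros_threshold=0.5):
    """Return warning labels for one column (thresholds are illustrative)."""
    n = len(values)
    if n == 0:
        return []
    counts = Counter(values)
    flags = []
    if len(counts) == n:
        # Every value occurs exactly once.
        flags.append("Unique")
    elif len(counts) > cardinality_threshold:
        flags.append("High cardinality")
    zero_share = counts.get(0, 0) / n
    if zero_share > zeros_threshold:
        flags.append(f"Zeros ({zero_share:.1%})")
    return flags


# A column with the same zero count as number_of_persons_injured.
injured = [0] * 72699 + [1] * 27301
print(column_flags(injured))
```

Only the zero count in the usage line is taken from the report; the non-zero values are lumped together as 1 for brevity.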

Reproduction

Analysis started: 2020-12-13 10:03:11.879583
Analysis finished: 2020-12-13 10:06:38.830317
Duration: 3 minutes and 26.95 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
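
A report of this kind is typically produced with a few lines of Python. The sketch below assumes the data sits in a CSV file and that `pandas` and `pandas-profiling` are installed; the function name `build_report` and both paths are hypothetical, and the settings used for this particular report (see config.yaml) are not reproduced.

```python
def build_report(csv_path, html_path):
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so this module loads even where the libraries are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Overview")
    profile.to_file(html_path)
    return profile
```

Calling `build_report("crashes.csv", "report.html")` would then write the report to disk, where both file names are placeholders.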

Variables

crash_date
Date

Distinct: 551
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB
Minimum: 2013-03-23 00:00:00
Maximum: 2020-09-29 00:00:00

[Figure: Histogram with fixed size bins (bins=50)]

crash_time
Categorical

HIGH CARDINALITY

Distinct: 1440
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
0:00 | 1637 | 1.6%
17:00 | 1363 | 1.4%
16:00 | 1360 | 1.4%
14:00 | 1298 | 1.3%
15:00 | 1246 | 1.2%
18:00 | 1231 | 1.2%
13:00 | 1153 | 1.2%
12:00 | 1103 | 1.1%
19:00 | 996 | 1.0%
10:00 | 971 | 1.0%
Other values (1430) | 87642 | 87.6%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 5
Mean length: 4.74399
Min length: 4

borough
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
BROOKLYN | 32585 | 32.6%
QUEENS | 27781 | 27.8%
BRONX | 18899 | 18.9%
MANHATTAN | 17695 | 17.7%
STATEN ISLAND | 3040 | 3.0%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 13
Median length: 8
Mean length: 7.20636
Min length: 5

zip_code
Real number (ℝ≥0)

Distinct: 203
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10876.94022
Minimum: 10000
Maximum: 11697
Zeros: 0
Zeros (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 10000
5-th percentile: 10013
Q1: 10455
median: 11208
Q3: 11249
95-th percentile: 11429
Maximum: 11697
Range: 1697
Interquartile range (IQR): 794

Descriptive statistics

Standard deviation: 533.7490418
Coefficient of variation (CV): 0.04907161674
Kurtosis: -1.371981632
Mean: 10876.94022
Median Absolute Deviation (MAD): 206
Skewness: -0.5167109366
Sum: 1087694022
Variance: 284888.0396
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
11207 | 2220 | 2.2%
11212 | 1563 | 1.6%
11385 | 1539 | 1.5%
11208 | 1526 | 1.5%
11236 | 1460 | 1.5%
11434 | 1376 | 1.4%
11368 | 1337 | 1.3%
11203 | 1303 | 1.3%
11234 | 1296 | 1.3%
10457 | 1220 | 1.2%
Other values (193) | 85160 | 85.2%

Minimum 5 values

Value | Count | Frequency (%)
10000 | 20 | < 0.1%
10001 | 778 | 0.8%
10002 | 802 | 0.8%
10003 | 526 | 0.5%
10004 | 145 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
11697 | 27 | < 0.1%
11695 | 1 | < 0.1%
11694 | 174 | 0.2%
11693 | 203 | 0.2%
11692 | 182 | 0.2%
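
Several of the zip_code statistics follow directly from others in the same block. The check below is a small sketch using the values printed in the report; the variable names are illustrative.

```python
# Values copied from the zip_code section of the report.
n_rows = 100_000
minimum, maximum = 10000, 11697
q1, q3 = 10455, 11249
mean = 10876.94022
std = 533.7490418

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = round(mean * n_rows)      # Sum

print(value_range, iqr, cv, variance, total)
```

The results agree with the reported Range (1697), IQR (794), CV, Variance and Sum up to rounding of the printed inputs.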
 

location
Categorical

HIGH CARDINALITY

Distinct: 44605
Distinct (%): 44.6%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
unspecified | 8204 | 8.2%
(40.861862, -73.91282) | 79 | 0.1%
(40.8047, -73.91243) | 55 | 0.1%
(40.820305, -73.89083) | 52 | 0.1%
(40.696033, -73.98453) | 48 | < 0.1%
(40.675735, -73.89686) | 48 | < 0.1%
(40.658577, -73.89063) | 47 | < 0.1%
(40.737785, -73.93496) | 43 | < 0.1%
(40.733536, -73.87035) | 41 | < 0.1%
(40.66496, -73.82226) | 40 | < 0.1%
Other values (44595) | 91343 | 91.3%

[Figure: Frequencies of value counts]

Unique

Unique: 29003
Unique (%): 29.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 25
Median length: 22
Mean length: 20.85131
Min length: 11

on_street_name
Categorical

HIGH CARDINALITY

Distinct: 4328
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
unspecified | 26009 | 26.0%
BELT PARKWAY | 1616 | 1.6%
LONG ISLAND EXPRESSWAY | 1053 | 1.1%
BROOKLYN QUEENS EXPRESSWAY | 956 | 1.0%
BROADWAY | 863 | 0.9%
FDR DRIVE | 852 | 0.9%
GRAND CENTRAL PKWY | 820 | 0.8%
ATLANTIC AVENUE | 717 | 0.7%
MAJOR DEEGAN EXPRESSWAY | 674 | 0.7%
CROSS BRONX EXPY | 652 | 0.7%
Other values (4318) | 65788 | 65.8%

[Figure: Frequencies of value counts]

Unique

Unique: 1369
Unique (%): 1.4%

[Figure: Histogram of lengths of the category]

Length

Max length: 32
Median length: 32
Mean length: 26.53811
Min length: 11

number_of_persons_injured
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.37196
Minimum: 0
Maximum: 15
Zeros: 72699
Zeros (%): 72.7%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 2
Maximum: 15
Range: 15
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7439161865
Coefficient of variation (CV): 1.999989748
Kurtosis: 16.5808926
Mean: 0.37196
Median Absolute Deviation (MAD): 0
Skewness: 3.147118256
Sum: 37196
Variance: 0.5534112925
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=13)]

Common values

Value | Count | Frequency (%)
0 | 72699 | 72.7%
1 | 21011 | 21.0%
2 | 4125 | 4.1%
3 | 1308 | 1.3%
4 | 523 | 0.5%
5 | 196 | 0.2%
6 | 77 | 0.1%
7 | 36 | < 0.1%
8 | 14 | < 0.1%
9 | 5 | < 0.1%
Other values (3) | 6 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 72699 | 72.7%
1 | 21011 | 21.0%
2 | 4125 | 4.1%
3 | 1308 | 1.3%
4 | 523 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
15 | 1 | < 0.1%
11 | 3 | < 0.1%
10 | 2 | < 0.1%
9 | 5 | < 0.1%
8 | 14 | < 0.1%
 
number_of_persons_killed
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
0 | 99816 | 99.8%
1 | 176 | 0.2%
2 | 7 | < 0.1%
3 | 1 | < 0.1%

[Figure: Frequencies of value counts]

Unique

Unique: 1
Unique (%): < 0.1%

[Figure: Histogram of lengths of the category]

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

number_of_pedestrians_injured
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04739
Minimum: 0
Maximum: 6
Zeros: 95454
Zeros (%): 95.5%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2234383296
Coefficient of variation (CV): 4.714883512
Kurtosis: 38.41899762
Mean: 0.04739
Median Absolute Deviation (MAD): 0
Skewness: 5.270026474
Sum: 4739
Variance: 0.04992468715
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Common values

Value | Count | Frequency (%)
0 | 95454 | 95.5%
1 | 4383 | 4.4%
2 | 142 | 0.1%
3 | 17 | < 0.1%
6 | 2 | < 0.1%
5 | 1 | < 0.1%
4 | 1 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 95454 | 95.5%
1 | 4383 | 4.4%
2 | 142 | 0.1%
3 | 17 | < 0.1%
4 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
6 | 2 | < 0.1%
5 | 1 | < 0.1%
4 | 1 | < 0.1%
3 | 17 | < 0.1%
2 | 142 | 0.1%
 
number_of_pedestrians_killed
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
0 | 99936 | 99.9%
1 | 64 | 0.1%

number_of_cyclist_injured
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
0 | 95147 | 95.1%
1 | 4744 | 4.7%
2 | 107 | 0.1%
3 | 2 | < 0.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

number_of_cyclist_killed
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
0 | 99975 | > 99.9%
1 | 25 | < 0.1%

number_of_motorist_injured
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.27492
Minimum: 0
Maximum: 15
Zeros: 81887
Zeros (%): 81.9%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 15
Range: 15
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.711058401
Coefficient of variation (CV): 2.586419326
Kurtosis: 21.76181987
Mean: 0.27492
Median Absolute Deviation (MAD): 0
Skewness: 3.819224309
Sum: 27492
Variance: 0.5056040496
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=13)]

Common values

Value | Count | Frequency (%)
0 | 81887 | 81.9%
1 | 12243 | 12.2%
2 | 3767 | 3.8%
3 | 1259 | 1.3%
4 | 523 | 0.5%
5 | 189 | 0.2%
6 | 73 | 0.1%
7 | 34 | < 0.1%
8 | 14 | < 0.1%
9 | 5 | < 0.1%
Other values (3) | 6 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 81887 | 81.9%
1 | 12243 | 12.2%
2 | 3767 | 3.8%
3 | 1259 | 1.3%
4 | 523 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
15 | 1 | < 0.1%
11 | 3 | < 0.1%
10 | 2 | < 0.1%
9 | 5 | < 0.1%
8 | 14 | < 0.1%
 
number_of_motorist_killed
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
0 | 99904 | 99.9%
1 | 89 | 0.1%
2 | 6 | < 0.1%
3 | 1 | < 0.1%

[Figure: Frequencies of value counts]

Unique

Unique: 1
Unique (%): < 0.1%

[Figure: Histogram of lengths of the category]

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

contributing_factor_vehicle_1
Categorical

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
Driver Inattention/Distraction | 25605 | 25.6%
Unspecified | 25253 | 25.3%
Following Too Closely | 7530 | 7.5%
Other_factor | 6994 | 7.0%
Failure to Yield Right-of-Way | 6023 | 6.0%
Backing Unsafely | 4033 | 4.0%
Passing or Lane Usage Improper | 3979 | 4.0%
Passing Too Closely | 3676 | 3.7%
Other Vehicular | 3071 | 3.1%
Unsafe Lane Changing | 2588 | 2.6%
Other values (6) | 11248 | 11.2%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 30
Median length: 19
Mean length: 20.43398
Min length: 11

contributing_factor_vehicle_2
Categorical

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
Unspecified | 67739 | 67.7%
Other_factor | 19861 | 19.9%
Driver Inattention/Distraction | 5284 | 5.3%
Following Too Closely | 1296 | 1.3%
Other Vehicular | 1249 | 1.2%
Passing or Lane Usage Improper | 802 | 0.8%
Failure to Yield Right-of-Way | 716 | 0.7%
Passing Too Closely | 538 | 0.5%
Unsafe Lane Changing | 402 | 0.4%
Unsafe Speed | 383 | 0.4%
Other values (7) | 1730 | 1.7%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 53
Median length: 11
Mean length: 13.00979
Min length: 11

contributing_factor_vehicle_3
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
Other_factor | 91338 | 91.3%
Unspecified | 8197 | 8.2%
Following Too Closely | 176 | 0.2%
Other Vehicular | 171 | 0.2%
Driver Inattention/Distraction | 118 | 0.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 30
Median length: 12
Mean length: 11.96024
Min length: 11

collision_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 100000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4226109.341
Minimum: 2568
Maximum: 4353706
Zeros: 0
Zeros (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 2568
5-th percentile: 3665427.95
Q1: 4182342.75
median: 4300224
Q3: 4328315.25
95-th percentile: 4348345.05
Maximum: 4353706
Range: 4351138
Interquartile range (IQR): 145972.5

Descriptive statistics

Standard deviation: 165356.0511
Coefficient of variation (CV): 0.03912725341
Kurtosis: 45.22161792
Mean: 4226109.341
Median Absolute Deviation (MAD): 51882.5
Skewness: -3.965406795
Sum: 4.226109341e+11
Variance: 2.734262364e+10
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
4327423 | 1 | < 0.1%
4308023 | 1 | < 0.1%
4185140 | 1 | < 0.1%
4344412 | 1 | < 0.1%
4342365 | 1 | < 0.1%
4348510 | 1 | < 0.1%
4346463 | 1 | < 0.1%
4172384 | 1 | < 0.1%
4301409 | 1 | < 0.1%
4307554 | 1 | < 0.1%
Other values (99990) | 99990 | > 99.9%

Minimum 5 values

Value | Count | Frequency (%)
2568 | 1 | < 0.1%
69010 | 1 | < 0.1%
74294 | 1 | < 0.1%
127733 | 1 | < 0.1%
210591 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
4353706 | 1 | < 0.1%
4353705 | 1 | < 0.1%
4353701 | 1 | < 0.1%
4353672 | 1 | < 0.1%
4353663 | 1 | < 0.1%
 
vehicle_type_code_1
Categorical

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
Sedan | 46790 | 46.8%
Station Wagon/Sport Utility Vehicle | 35766 | 35.8%
Other_code | 7217 | 7.2%
Taxi | 3478 | 3.5%
Pick-up Truck | 2615 | 2.6%
Box Truck | 1946 | 1.9%
Bike | 1437 | 1.4%
Tractor Truck Diesel | 751 | 0.8%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 35
Median length: 5
Mean length: 16.44119
Min length: 4

vehicle_type_code_2
Categorical

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
Sedan | 31369 | 31.4%
Other_code | 31039 | 31.0%
Station Wagon/Sport Utility Vehicle | 24773 | 24.8%
Bike | 3586 | 3.6%
Taxi | 2300 | 2.3%
Pick-up Truck | 2282 | 2.3%
Box Truck | 2146 | 2.1%
Bus | 1011 | 1.0%
Tractor Truck Diesel | 763 | 0.8%
Motorcycle | 731 | 0.7%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 35
Median length: 10
Mean length: 14.32417
Min length: 3

vehicle_type_code_3
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
Other_code | 92036 | 92.0%
Sedan | 4129 | 4.1%
Station Wagon/Sport Utility Vehicle | 3380 | 3.4%
Pick-up Truck | 195 | 0.2%
Taxi | 187 | 0.2%
Box Truck | 73 | 0.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 35
Median length: 10
Mean length: 10.63245
Min length: 4

crash_day
Categorical

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
Friday | 15494 | 15.5%
Tuesday | 14653 | 14.7%
Thursday | 14573 | 14.6%
Wednesday | 14285 | 14.3%
Monday | 14242 | 14.2%
Saturday | 13964 | 14.0%
Sunday | 12789 | 12.8%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 9
Median length: 7
Mean length: 7.14582
Min length: 6

crash_month
Real number (ℝ≥0)

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.02299
Minimum: 2
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 4
Q1: 6
median: 7
Q3: 8
95-th percentile: 9
Maximum: 12
Range: 10
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.828325706
Coefficient of variation (CV): 0.2603343741
Kurtosis: 0.02879865261
Mean: 7.02299
Median Absolute Deviation (MAD): 1
Skewness: -0.2523065626
Sum: 702299
Variance: 3.342774888
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=11)]

Common values

Value | Count | Frequency (%)
8 | 26950 | 27.0%
7 | 23589 | 23.6%
9 | 13226 | 13.2%
6 | 11646 | 11.6%
5 | 8426 | 8.4%
4 | 7679 | 7.7%
3 | 4133 | 4.1%
11 | 3603 | 3.6%
12 | 506 | 0.5%
10 | 154 | 0.2%

Minimum 5 values

Value | Count | Frequency (%)
2 | 88 | 0.1%
3 | 4133 | 4.1%
4 | 7679 | 7.7%
5 | 8426 | 8.4%
6 | 11646 | 11.6%

Maximum 5 values

Value | Count | Frequency (%)
12 | 506 | 0.5%
11 | 3603 | 3.6%
10 | 154 | 0.2%
9 | 13226 | 13.2%
8 | 26950 | 27.0%
 

crash_year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2019.37856
Minimum: 2013
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 2013
5-th percentile: 2017
Q1: 2019
median: 2020
Q3: 2020
95-th percentile: 2020
Maximum: 2020
Range: 7
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7793320217
Coefficient of variation (CV): 0.0003859266594
Kurtosis: 3.639582223
Mean: 2019.37856
Median Absolute Deviation (MAD): 0
Skewness: -1.657671392
Sum: 201937856
Variance: 0.6073584
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Common values

Value | Count | Frequency (%)
2020 | 50037 | 50.0%
2019 | 43881 | 43.9%
2017 | 5871 | 5.9%
2018 | 148 | 0.1%
2015 | 43 | < 0.1%
2013 | 19 | < 0.1%
2014 | 1 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
2013 | 19 | < 0.1%
2014 | 1 | < 0.1%
2015 | 43 | < 0.1%
2017 | 5871 | 5.9%
2018 | 148 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
2020 | 50037 | 50.0%
2019 | 43881 | 43.9%
2018 | 148 | 0.1%
2017 | 5871 | 5.9%
2015 | 43 | < 0.1%
 

combine_location
Categorical

HIGH CARDINALITY

Distinct: 44606
Distinct (%): 44.6%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
(nan,nan) | 8035 | 8.0%
(0.0,0.0) | 169 | 0.2%
(40.861862,-73.91282) | 79 | 0.1%
(40.8047,-73.91243) | 55 | 0.1%
(40.820305,-73.89083000000001) | 52 | 0.1%
(40.675734999999996,-73.89686) | 48 | < 0.1%
(40.696033,-73.98453) | 48 | < 0.1%
(40.658577,-73.89063) | 47 | < 0.1%
(40.737784999999995,-73.93496) | 43 | < 0.1%
(40.733536,-73.87035) | 41 | < 0.1%
Other values (44596) | 91383 | 91.4%

[Figure: Frequencies of value counts]

Unique

Unique: 29003
Unique (%): 29.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 39
Median length: 21
Mean length: 23.06706
Min length: 9

nearest_street
Categorical

HIGH CARDINALITY

Distinct: 27727
Distinct (%): 27.7%
Missing: 0
Missing (%): 0.0%
Memory size: 781.2 KiB

Value | Count | Frequency (%)
unspecified | 26908 | 26.9%
3 AVENUE | 432 | 0.4%
BROADWAY | 424 | 0.4%
2 AVENUE | 340 | 0.3%
LINDEN BOULEVARD | 280 | 0.3%
5 AVENUE | 247 | 0.2%
ATLANTIC AVENUE | 240 | 0.2%
1 AVENUE | 237 | 0.2%
7 AVENUE | 229 | 0.2%
PARK AVENUE | 222 | 0.2%
Other values (27717) | 70441 | 70.4%

[Figure: Frequencies of value counts]

Unique

Unique: 22498
Unique (%): 22.5%

[Figure: Histogram of lengths of the category]

Length

Max length: 40
Median length: 13
Mean length: 19.56421
Min length: 1

Interactions

[Figures: 49 pairwise interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
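
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report generator uses; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant sample")
    # The 1/n normalisation cancels between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
print(pearson_r(xs, ys))
# Shifting and positively rescaling xs leaves r unchanged.
print(pearson_r([10 * x + 5 for x in xs], ys))
```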

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
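
A minimal sketch of this rank-based calculation follows. It assumes ties receive the average of the ranks they span, which is a common convention but is not stated in the report; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# A nonlinear but strictly increasing relationship gives rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```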

Kendall's τ

Similar to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
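
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; statistical libraries commonly default to tau-b, which additionally adjusts for ties, so results may differ on tied data. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same length")
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        pairs += 1
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 is a tie and counts as neither
    if pairs == 0:
        raise ValueError("need at least 2 observations")
    return (concordant - discordant) / pairs


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the usage line, two of the three pairs are concordant and one is discordant, so the result is 1/3.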

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
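
As an illustration, the bias-corrected coefficient can be computed from a contingency table as sketched below. The formula follows the commonly cited form of Bergsma's correction; the function name `cramers_v_corrected` and the list-of-rows input format are assumptions made for this example.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    r = len(table)
    k = len(table[0])
    n = sum(sum(row) for row in table)
    if n < 2:
        raise ValueError("need at least 2 observations")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


print(cramers_v_corrected([[10, 20], [20, 40]]))  # rows proportional: independence
print(cramers_v_corrected([[50, 0], [0, 50]]))    # each row maps to one column
```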

Missing values

[Figures: missing-value plots; the dataset has no missing cells]

Sample

First rows

(index) | crash_date | crash_time | borough | zip_code | location | on_street_name | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_cyclist_injured | number_of_cyclist_killed | number_of_motorist_injured | number_of_motorist_killed | contributing_factor_vehicle_1 | contributing_factor_vehicle_2 | contributing_factor_vehicle_3 | collision_id | vehicle_type_code_1 | vehicle_type_code_2 | vehicle_type_code_3 | crash_day | crash_month | crash_year | combine_location | nearest_street
0 | 2019-08-03 | 17:25 | STATEN ISLAND | 10307 | (40.501465, -74.24523) | SWINNERTON STREET | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | 4182249 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Saturday | 8 | 2019 | (40.501465,-74.24523) | CLERMONT AVENUE
1 | 2019-09-07 | 0:36 | STATEN ISLAND | 10307 | (40.50331, -74.237465) | SPRAGUE AVENUE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unsafe Speed | Unspecified | Unspecified | 4201115 | Station Wagon/Sport Utility Vehicle | Sedan | Station Wagon/Sport Utility Vehicle | Saturday | 9 | 2019 | (40.50331,-74.237465) | unspecified
2 | 2019-08-17 | 15:00 | STATEN ISLAND | 10307 | (40.503387, -74.24883) | FINLAY STREET | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Other_factor | Other_factor | 4198160 | Sedan | Other_code | Other_code | Saturday | 8 | 2019 | (40.503387,-74.24883) | unspecified
3 | 2017-05-06 | 2:55 | STATEN ISLAND | 10307 | (40.503414, -74.24496) | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Alcohol Involvement | Unspecified | Unspecified | 3664377 | Station Wagon/Sport Utility Vehicle | Sedan | Sedan | Saturday | 5 | 2017 | (40.503414,-74.24495999999999) | 463 MAIN STREET
4 | 2020-07-08 | 20:20 | STATEN ISLAND | 10307 | (40.50447, -74.243454) | HYLAN BOULEVARD | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Failure to Yield Right-of-Way | Unspecified | Other_factor | 4327159 | Sedan | Sedan | Other_code | Wednesday | 7 | 2020 | (40.50447,-74.243454) | unspecified
5 | 2020-07-06 | 17:00 | STATEN ISLAND | 10307 | (40.504482, -74.24727) | unspecified | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Unspecified | Other_factor | Other_factor | 4326628 | Station Wagon/Sport Utility Vehicle | Other_code | Other_code | Monday | 7 | 2020 | (40.504482,-74.24727) | 171 CARTERET STREET
6 | 2019-07-09 | 12:50 | STATEN ISLAND | 10307 | (40.505527, -74.23819) | HYLAN BOULEVARD | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | Following Too Closely | Unspecified | Other_factor | 4167167 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Other_code | Tuesday | 7 | 2019 | (40.505527,-74.23819) | SPRAGUE AVENUE
7 | 2019-07-20 | 16:30 | STATEN ISLAND | 10307 | (40.506187, -74.2349) | HYLAN BOULEVARD | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Failure to Yield Right-of-Way | Unspecified | Other_factor | 4173987 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Other_code | Saturday | 7 | 2019 | (40.506187,-74.2349) | JOLINE AVENUE
8 | 2020-03-15 | 4:10 | STATEN ISLAND | 10307 | (40.506187, -74.2349) | JOLINE AVENUE | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Other_factor | Other_factor | 4307254 | Sedan | Other_code | Other_code | Sunday | 3 | 2020 | (40.506187,-74.2349) | HYLAN BOULEVARD
9 | 2020-07-18 | 7:26 | STATEN ISLAND | 10307 | (40.506187, -74.2349) | HYLAN BOULEVARD | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | Other_factor | Other_factor | 4331848 | Sedan | Other_code | Other_code | Saturday | 7 | 2020 | (40.506187,-74.2349) | JOLINE AVENUE

Last rows

(index) | crash_date | crash_time | borough | zip_code | location | on_street_name | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_cyclist_injured | number_of_cyclist_killed | number_of_motorist_injured | number_of_motorist_killed | contributing_factor_vehicle_1 | contributing_factor_vehicle_2 | contributing_factor_vehicle_3 | collision_id | vehicle_type_code_1 | vehicle_type_code_2 | vehicle_type_code_3 | crash_day | crash_month | crash_year | combine_location | nearest_street
99990 | 2019-11-11 | 14:30 | QUEENS | 11354 | unspecified | 30 AVENUE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Turning Improperly | Unspecified | Other_factor | 4239398 | Sedan | Other_code | Other_code | Monday | 11 | 2019 | (0.0,0.0) | COLLEGE POINT BOULEVARD
99991 | 2019-11-10 | 4:37 | QUEENS | 11103 | unspecified | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Alcohol Involvement | Unspecified | Other_factor | 4238627 | Sedan | Station Wagon/Sport Utility Vehicle | Other_code | Sunday | 11 | 2019 | (0.0,0.0) | 28-42 37 STREET
99992 | 2019-11-12 | 0:50 | BROOKLYN | 11211 | unspecified | BROADWAY | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Other_factor | Other_factor | 4240012 | Other_code | Other_code | Other_code | Tuesday | 11 | 2019 | (0.0,0.0) | MARCY AVENUE
99993 | 2019-11-11 | 21:00 | MANHATTAN | 10002 | unspecified | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | Other_factor | 4239386 | Station Wagon/Sport Utility Vehicle | Other_code | Other_code | Monday | 11 | 2019 | (0.0,0.0) | 128 PITT STREET
99994 | 2019-11-10 | 4:15 | BROOKLYN | 11226 | unspecified | FLATBUSH AVENUE | 5 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | Other Vehicular | Traffic Control Disregarded | Other_factor | 4238152 | Sedan | Station Wagon/Sport Utility Vehicle | Other_code | Sunday | 11 | 2019 | (0.0,0.0) | SNYDER AVENUE
99995 | 2019-11-11 | 23:11 | BROOKLYN | 11229 | unspecified | KINGS HIGHWAY | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Other_factor | 4239412 | Sedan | Sedan | Other_code | Monday | 11 | 2019 | (0.0,0.0) | OCEAN AVENUE
99996 | 2019-11-10 | 3:46 | QUEENS | 11377 | unspecified | ROOSEVELT AVENUE | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Unspecified | Other_factor | Other_factor | 4239029 | Station Wagon/Sport Utility Vehicle | Other_code | Other_code | Sunday | 11 | 2019 | (0.0,0.0) | 58 STREET
99997 | 2019-11-11 | 21:40 | BROOKLYN | 11221 | unspecified | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | Other_factor | 4239762 | Sedan | Sedan | Other_code | Monday | 11 | 2019 | (0.0,0.0) | 829 GATES AVENUE
99998 | 2019-11-11 | 23:30 | MANHATTAN | 10013 | unspecified | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | Other_factor | 4239642 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | Other_code | Monday | 11 | 2019 | (0.0,0.0) | 9 CROSBY STREET
99999 | 2019-11-10 | 3:30 | QUEENS | 11419 | unspecified | unspecified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | Other_factor | 4238994 | Sedan | Other_code | Other_code | Sunday | 11 | 2019 | (0.0,0.0) | 134-30 ATLANTIC AVENUE